A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization
نویسندگان
چکیده
Modeling multi-way data can be accomplished using tensors, which are data structures indexed along three or more dimensions. Tensors are increasingly used to analyze extremely large and sparse multi-way datasets in life sciences, engineering, and business. The canonical polyadic decomposition (CPD) is a popular tensor factorization for discovering latent features and is most commonly found via the method of alternating least squares (CPD-ALS). The computational time and memory required to compute CPD limits the size and dimensionality of the tensors that can be solved on a typical workstation, making distributed solution approaches the only viable option. Most methods for distributed-memory systems have focused on distributing the tensor in a coarse-grained, one-dimensional fashion that prohibitively requires the dense matrix factors to be fully replicated on each node. Recent work overcomes this limitation by using a fine-grained decomposition of the tensor nonzeros, at the cost of computationally expensive hypergraph partitioning. To that effect, we present a medium-grained decomposition that avoids complete factor replication and communication, while eliminating the need for expensive pre-processing steps. We use a hybrid MPI+OpenMP implementation that exploits multi-core architectures with a low memory footprint. We theoretically analyze the scalability of the coarse-, medium-, and fine-grained decompositions and experimentally compare them across a variety of datasets. Experiments show that the medium-grained decomposition reduces communication volume by 36-90% compared to the coarse-grained decomposition, is 41-76x faster than a state-ofthe-art MPI code, and is 1.5-5.0x faster than the fine-grained decomposition with 1024 cores. Keywords-Sparse tensor, distributed, PARAFAC, CPD, parallel, medium-grained
منابع مشابه
LFTF: A Framework for Efficient Tensor Analytics at Scale
Tensors are higher order generalizations of matrices to model multi-aspect data, e.g., a set of purchase records with the schema (user id, product id, timestamp, feedback). Tensor factorization is a powerful technique for generating a model from a tensor, just like matrix factorization generates a model from a matrix, but with higher accuracy and richer information as more attributes are availa...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملDMS: Distributed Sparse Tensor Factorization with Alternating Least Squares
Tensors are data structures indexed along three or more dimensions. Tensors have found increasing use in domains such as data mining and recommender systems where dimensions can have enormous length and are resultingly very sparse. The canonical polyadic decomposition (CPD) is a popular tensor factorization for discovering latent features and is most commonly found via the method of alternating...
متن کاملA social recommender system based on matrix factorization considering dynamics of user preferences
With the expansion of social networks, the use of recommender systems in these networks has attracted considerable attention. Recommender systems have become an important tool for alleviating the information that overload problem of users by providing personalized recommendations to a user who might like based on past preferences or observed behavior about one or various items. In these systems...
متن کاملDistributed Flexible Nonlinear Tensor Factorization for Large Scale Multiway Data Analysis
Tensor factorization is an important approach to multiway data analysis. However, real-world tensor data often encompass complex interactions among tensor elements, and are extremely sparse and of huge size. Despite the success of exiting approaches, they are either not powerful enough to model the complex interactions or extreme sparsity in data. To overcome these limits, we propose a new tens...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016